IN VIVO CONSTRUCTION OF DNA LIBRARIES
Field of the Invention

This invention relates to an intracellular method for making DNA libraries.
Background of the Invention 

A cDNA library is a collection of cloned DNA molecules propagated in an
appropriate host. It is usually derived from the mRNA population of a particular
cell, tissue or organ by reverse transcription, cloned into a vector molecule and
propagated in an appropriate host cell.

cDNA libraries are useful in numerous applications. cDNA libraries can be
used to isolate and identify cell-specific expressed sequences. A cDNA clone
isolated from a library can be sequenced and translated (e.g., by computer programs)
to derive the primary amino acid sequence of the encoded protein or can be used as a
labeled probe to investigate gene expression in vivo.

cDNA libraries can also be used in a two-hybrid assay to screen a large
number of candidate proteins and identify those which interact with a particular
target protein. In this approach, cDNAs are incorporated into activation domain
vectors to provide random proteins fused to an activation domain of a known
transcription factor. Vectors encoding the target protein fused to the DNA binding
domain of the transcription factor, and the library of activation domain hybrids, are
cotransformed into a reporter strain. Interaction of the target protein moiety of a
target protein-DNA binding domain fusion protein with a protein encoded by cDNA
brings the DNA binding domain into proximity with the activation domain fused to
the cDNA encoded protein. The resulting transcription identifies a positive clone.
Once a positive clone has been identified, the gene corresponding to the interacting
protein can be isolated and analyzed.

The use of cDNA libraries has become increasingly widespread and, as a
result, the need for methods which allow the rapid construction of cDNA libraries in
vectors appropriate for particular applications is imperative.
Summary of the Invention

In general, the invention features a method for constructing a DNA library,
e.g., a cDNA library, in vivo. The method includes:
providing a plurality of host cells;

providing a vector having a first region and a second region;

providing a plurality of nucleic acid insert molecules having a first common
region which is homologous with the first region of the vector, a second common
region which is homologous with the second region of the vector, and a library
element encoding region disposed between the first common region and the second
common region, wherein when the library element encoding region encodes a
naturally occurring sequence, the first and second common regions are not naturally
found adjacent to the library element encoding region (the term "common" means
that each molecule of the plurality includes the common sequence);

introducing a vector molecule into each of the host cells;

introducing a nucleic acid insert molecule into each of the cells, wherein a
different library element encoding region is introduced into each of the cells; and

allowing homologous recombination and gap repair between the vector
molecule and the nucleic acid insert molecule to occur, thereby constructing a DNA
library.
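
The steps above can be pictured with a minimal Python sketch. It is a string-level illustration only, not a model of the biological mechanism. The function names (`gap_repair`, `build_library`), the toy sequences, and the assumption that the vector has been linearized between its first and second regions are hypothetical choices made for this example.

```python
def gap_repair(vector_left, vector_right, insert, first_common, second_common):
    """Return the repaired sequence, or None when the required homology is absent.

    Hypothetical model: vector_left ends with the vector's first region and
    vector_right starts with its second region (a linearized vector).
    """
    if len(insert) < len(first_common) + len(second_common):
        return None
    vector_ok = (vector_left.endswith(first_common)
                 and vector_right.startswith(second_common))
    insert_ok = (insert.startswith(first_common)
                 and insert.endswith(second_common))
    if not (vector_ok and insert_ok):
        return None
    # Each shared region appears once in the product, so only the library
    # element encoding region is spliced between the two vector arms.
    element = insert[len(first_common):len(insert) - len(second_common)]
    return vector_left + element + vector_right


def build_library(vector_left, vector_right, inserts, first_common, second_common):
    """One host cell per insert; keep only the cells in which repair succeeded."""
    clones = []
    for insert in inserts:
        product = gap_repair(vector_left, vector_right, insert,
                             first_common, second_common)
        if product is not None:
            clones.append(product)
    return clones


FIRST = "ACGTACGTACGTACGTACGT"   # toy 20-base first common region (assumption)
SECOND = "TTGACCTTGACCTTGACCTT"  # toy 20-base second common region (assumption)
library = build_library(
    "GGGG" + FIRST, SECOND + "CCCC",
    [FIRST + "ATGAAA" + SECOND, FIRST + "ATGCCC" + SECOND, "ATGTTT"],
    FIRST, SECOND,
)
```

In this toy run the third insert lacks both common regions, so it yields no clone; the two flanked inserts each give one library member.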

In preferred embodiments, the DNA library can be a cDNA library, a 
genomic DNA library, or a synthetic DNA library. 

In preferred embodiments, homologous recombination and gap repair occur
between the vector molecule and the nucleic acid insert molecule.
In preferred embodiments, the first and the second common regions can be
the same or can be different. The first and the second common regions can be all or
part of a linker used for the creation of an existing cDNA library, or they can be all
or part of a site the library element encoding region had been inserted in. For
example, the first and the second common regions can be all or part of a vector,
e.g., all or part of a polylinker region, or part of a naturally occurring sequence
existing adjacent to the library element encoding region, e.g., all or part of a gene,
such as a conserved sequence within a gene, e.g., a zinc finger motif, a helix loop
helix motif, or a WW domain.

In preferred embodiments, the second region of the first and the second
primers can be the same or can be different. The second region of the first and the
second primers can be homologous to a vector sequence, e.g., a polylinker site or a
sequence which flanks the insertion site, or can be homologous to a sequence in a
different nucleic acid insert molecule, e.g., a nucleic acid insert molecule intended to
be part of a final construct including a plurality of nucleic acid insert molecules. For
example, the second region of the first and the second primers can be homologous to
a restriction enzyme cleavage site, e.g., a Not I, an EcoR I, or a Hind III cleavage
site.

In preferred embodiments, the second region of the first primer is 5' to the
first region of the primer. In preferred embodiments, the second region of the
second primer is 3' to the first region of the primer.
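
The primer layout described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: `design_primers` and `reverse_complement` are hypothetical helper names, the sequences are toy values, and the sketch assumes that the "3'" placement for the second primer is read along the insert's top strand, so that the oligonucleotide actually used for priming would be the reverse complement of that top-strand sequence.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(first_common, second_common, vector_first, vector_second):
    """Return (first_primer, second_primer), each written 5' to 3'.

    first_primer: second region (vector homology) placed 5' to the first
    region (which anneals to the first common sequence).
    second_primer: on the top strand the second region lies 3' to the first
    region; the oligo itself is the reverse complement (assumption).
    """
    first_primer = vector_first + first_common
    second_top_strand = second_common + vector_second
    second_primer = reverse_complement(second_top_strand)
    return first_primer, second_primer


# Toy four-base regions chosen only to keep the example readable.
first_primer, second_primer = design_primers("AAAC", "GGGT", "TTTT", "CCCC")
```

In practice each region would be far longer than four bases; the short strings here only make the orientation easy to follow.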

In preferred embodiments, the host cell can be a yeast cell, e.g., a
Saccharomyces cerevisiae or Schizosaccharomyces pombe cell, a bacterial cell, e.g.,
an E. coli cell, such as, for example, the E. coli strains CJ236, NM522, 5K,
TGE7300, JM101, JM107, KM392 or LE392, or a mammalian cell, such as, for
example, a CHO, COS, C127, or a HepG2 cell.

In preferred embodiments, the vector can be linearized prior to being
introduced into the host cell. For example, the vector can be linearized by cleaving
between the first and second regions of the vector. Examples of vectors which can
be used in the methods of the invention include λgt10, λgt11, the ZAP series vectors
(Stratagene), pESP-1, pOPRSVlMSC, pGAD.GH, pVP16, pACT, pGAD424,
pGAD2F, or pJG4-5.

In preferred embodiments, the second region of the nucleic acid insert
molecule is produced by PCR, using primers having a first region which is
homologous to the 3' end of the element encoding region and a second region which
is homologous to the second region of the vector. In preferred embodiments, the
first region of the nucleic acid insert molecule is produced by PCR, using primers
having a first region which is homologous to the 5' end of the element encoding
region and a second region which is homologous to the first region of the vector.

In preferred embodiments, the second region of the nucleic acid insert
molecule is produced by the ligation of adapters having a sequence homologous to
the second region of the vector. In preferred embodiments, the first region of the
nucleic acid insert molecule is produced by the ligation of adapters having a
sequence homologous to the first region of the vector.

In preferred embodiments, the first and second regions of the nucleic acid
insert molecule can be at least 20, 30, 40, 50, 60 or more base pairs in length. In
preferred embodiments, the first and second common sequences of the nucleic acid
insert molecule can be at least 20, 30, 40, 50, 60 or more base pairs in length.

In preferred embodiments, the library element encoding region can be
obtained from an existing cDNA library, e.g., a plasmid based cDNA library or a
phage based cDNA library; an mRNA molecule, e.g., an mRNA molecule derived
from a tissue, e.g., a cancerous tissue, such as, for example, prostate cancer tissue; or
a DNA molecule, e.g., a naturally occurring DNA molecule or a synthetic DNA
molecule. The library element encoding region can be a gene or a part thereof, for
example, a promoter, a protein encoding region, a translational terminator or a
transcriptional terminator; or an intragenic sequence, e.g., an intragenic sequence
which encodes, for example, a transcriptional enhancer or silencer. In preferred
embodiments, the library element encoding region is obtained from a few cells, e.g.,
less than 10, 100, or 1,000 cells (i.e., which contain less than 100, 1,000, or 10,000
pg of RNA).
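
The cell counts and RNA amounts quoted above can be compared with a short calculation. This sketch assumes the two lists are meant to be read pairwise (10 cells with 100 pg, and so on); the variable names are illustrative only.

```python
# Upper bounds quoted in the text, paired in the order given (assumption).
cell_counts = [10, 100, 1_000]
rna_pg = [100, 1_000, 10_000]

# RNA per cell implied by each pair, in picograms.
pg_per_cell = [pg / cells for cells, pg in zip(cell_counts, rna_pg)]
```

Read this way, each pair implies the same working figure of RNA per cell.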

In preferred embodiments, the vector further includes an element encoding a
detectable agent, e.g., a member of a binding pair, e.g., a member of a ligand/
counter-ligand pair, an antigen, a detectable enzyme, e.g., a beta-galactosidase, an
alkaline phosphatase, a horseradish peroxidase, or a luciferase gene, which is, for
example, fused with the library element encoding region, such that the library
element encoding region can be detected.

In preferred embodiments, the DNA library can be screened in a two-hybrid
system or it can be used for screening and cloning novel genes. In preferred
embodiments, the vector can include a transcription factor activation domain and the
method can further include introducing into the host cell a nucleic acid molecule
encoding a hybrid protein, wherein the hybrid protein comprises a transcription
factor DNA-binding domain attached to a test protein; introducing into the host cell
a detectable gene, wherein the detectable gene comprises a regulator site recognized
by the DNA-binding domain and wherein the detectable gene expresses a detectable
protein when the test protein interacts with a protein encoded by the DNA library;
plating the host cell onto selective media; and selecting for the host cell containing a
DNA encoded protein which interacts with the test protein.
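
The selection logic of the two-hybrid screen can be summarized in a small Python sketch. It is a logical illustration only: the clone and protein names, the `interactions` set, and the function names are hypothetical, and real reporter readouts are not simply true or false.

```python
def reporter_active(test_protein, library_protein, interactions):
    """The detectable gene is expressed only when the test protein (fused to
    the DNA-binding domain) interacts with the library-encoded protein
    (fused to the activation domain)."""
    return frozenset((test_protein, library_protein)) in interactions


def select_positive_clones(library, test_protein, interactions):
    """Return the clones that would survive on selective media."""
    return [
        clone
        for clone, library_protein in library.items()
        if reporter_active(test_protein, library_protein, interactions)
    ]


# Hypothetical library: clone name -> protein encoded by its insert.
library = {"clone1": "ProtA", "clone2": "ProtB", "clone3": "ProtC"}
# Hypothetical set of known pairwise interactions.
interactions = {frozenset(("Bait", "ProtB"))}

positives = select_positive_clones(library, "Bait", interactions)
```

Using a `frozenset` for each pair makes the lookup independent of which partner is listed first.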

In another aspect, the invention features a method of preparing a plurality of
nucleic acid insert molecules. The method includes:

providing a plurality of nucleic acid molecules wherein each of the nucleic
acid molecules includes, in order from 5' to 3', a first common sequence, a library
element encoding region, and a second common sequence (the term "common"
means that each molecule of the plurality includes the common sequence);

providing a plurality of first primers, each of the first primers having a first
region homologous with the first common sequence of the nucleic acid molecule and
having a second region which is not homologous with the first (and preferably
second) common sequence;

providing a plurality of second primers, each of the second primers having a
first region homologous with the second common sequence of the nucleic acid
molecule and having a second region which is not homologous with the second (and
preferably first) common sequence;

forming a reaction mixture which includes the plurality of nucleic acid
molecules, the plurality of the first primers, and the plurality of the second primers,
under conditions which provide, e.g., by primer directed synthesis, a plurality of
nucleic acid insert molecules having the following structure, in order from 5' to 3', a
second region of the first primer/the first common region/a library element encoding
region/the second common region/a second region of the second primer, thereby
preparing a plurality of nucleic acid insert molecules.
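
The five-part structure of the resulting insert molecule can be illustrated with a short Python sketch. The function names (`tailed_product`, `split_product`) and the toy sequences are hypothetical; the sketch only concatenates top-strand strings and does not simulate primer annealing or extension.

```python
def tailed_product(template, first_common, second_common, first_tail, second_tail):
    """Return the top strand of the product, or None if the template lacks
    the common sequences at its ends.

    first_tail / second_tail stand for the second regions of the first and
    second primers, written in top-strand orientation (assumption).
    """
    if len(template) < len(first_common) + len(second_common):
        return None
    if not (template.startswith(first_common) and template.endswith(second_common)):
        return None
    return first_tail + template + second_tail


def split_product(product, first_common, second_common, first_tail, second_tail):
    """Split a product into its five named parts, in order from 5' to 3'."""
    core = product[len(first_tail):len(product) - len(second_tail)]
    element = core[len(first_common):len(core) - len(second_common)]
    return (first_tail, first_common, element, second_common, second_tail)


# Toy template: first common sequence / library element / second common sequence.
template = "AAAC" + "ATGGCC" + "GGGT"
product = tailed_product(template, "AAAC", "GGGT", "TTTT", "CCCC")
parts = split_product(product, "AAAC", "GGGT", "TTTT", "CCCC")
```

Joining the five parts in order reproduces the product, which mirrors the slash-separated structure given in the text.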

In preferred embodiments, the first and the second common sequences can
be the same or can be different. The first and the second common sequences can be
all or part of a linker used for the creation of an existing cDNA library, or they can
be all or part of a site the library element encoding region had been inserted in. For
example, the first and the second common sequences can be all or part of a vector,
e.g., all or part of a polylinker region, or part of a naturally occurring sequence
existing adjacent to the library element encoding region, e.g., all or part of a gene,
such as a conserved sequence within a gene, e.g., a zinc finger motif, a helix loop
helix motif, or a WW domain.

In preferred embodiments, the second region of the first and the second
primers can be the same or can be different. The second region of the first and the
second primers can be homologous to a vector sequence, e.g., a polylinker site or a
sequence which flanks the insertion site, or can be homologous to a sequence in a
different nucleic acid insert molecule, e.g., a nucleic acid insert molecule intended to
be part of a final construct including a plurality of nucleic acid insert molecules. For
example, the second region of the first and the second primers can be homologous to
a restriction enzyme cleavage site, e.g., a Not I, an EcoR I, or a Hind III cleavage
site.

In preferred embodiments, the second region of the first primer is 5' to the
first region of the primer. In preferred embodiments, the second region of the
second primer is 3' to the first region of the primer.

In preferred embodiments, the second region of the nucleic acid insert
molecule is produced by PCR, using primers having a first region which is
homologous to the 3' end of the element encoding region and a second region which
is homologous to the second region of the vector. In preferred embodiments, the
first region of the nucleic acid insert molecule is produced by PCR, using primers
having a first region which is homologous to the 5' end of the element encoding
region and a second region which is homologous to the first region of the vector.

In preferred embodiments, the second region of the nucleic acid insert
molecule is produced by the ligation of adapters having a sequence homologous to
the second region of the vector. In preferred embodiments, the first region of the
nucleic acid insert molecule is produced by the ligation of adapters having a
sequence homologous to the first region of the vector.

In preferred embodiments, the first and second regions of the nucleic acid
insert molecule can be at least 20, 30, 40, 50, 60 or more base pairs in length. In
preferred embodiments, the first and second common sequences of the nucleic acid
insert molecule can be at least 20, 30, 40, 50, 60 or more base pairs in length.



In preferred embodiments, the library element encoding region can be
obtained from an existing cDNA library, e.g., a plasmid based cDNA library or a
phage based cDNA library; an mRNA molecule, e.g., an mRNA molecule derived
from a tissue, e.g., a cancerous tissue, such as, for example, prostate cancer tissue; or
a DNA molecule, e.g., a naturally occurring DNA molecule or a synthetic DNA
molecule. The library element encoding region can be a gene or a part thereof, for
example, a promoter, a protein encoding region, a translational terminator or a
transcriptional terminator; or an intragenic sequence, e.g., an intragenic sequence
which encodes, for example, a transcriptional enhancer or silencer. In preferred
embodiments, the library element encoding region is obtained from a few cells, e.g.,
less than 10, 100, or 1,000 cells (i.e., which contain less than 100, 1,000, or 10,000
pg of RNA).

In another aspect, the invention features a method of constructing a DNA
library, e.g., a cDNA library. The method includes:

providing a plurality of nucleic acid molecules wherein each of the nucleic
acid molecules includes, in order from 5' to 3', a first common sequence, a library
element encoding region, and a second common sequence (the term "common"
means that each molecule of the plurality includes the common sequence);

providing a plurality of first primers, each of the first primers having a first 
region homologous with the first common sequence of the nucleic acid molecule and 
having a second region which is not homologous with the first (and preferably 
second) common sequence; 

providing a plurality of second primers, each of the second primers having a
first region homologous with the second common sequence of the nucleic acid
molecule and having a second region which is not homologous with the second (and
preferably first) common sequence;

forming a reaction mixture which includes the plurality of nucleic acid
molecules, the plurality of the first primers, and the plurality of the second primers,
under conditions which provide, e.g., by primer directed synthesis, a plurality of
nucleic acid insert molecules having the following structure, in order from 5' to 3', a
second region of the first primer/the first common region/a library element encoding
region/the second common region/a second region of the second primer;

providing a plurality of host cells;

providing a vector having a first region which is homologous with the
second region of the first primer, and a second region which is homologous with the
second region of the second primer;

introducing a vector molecule into each of the host cells; and

introducing one or more of the nucleic acid insert molecules into each of the
cells under conditions which allow for recombination and gap repair, thereby
providing a DNA library.

In preferred embodiments, the DNA library can be a cDNA library, a
genomic DNA library, or a synthetic DNA library.

In preferred embodiments, homologous recombination and gap repair occur
between the vector molecule and the nucleic acid insert molecule.

In preferred embodiments, the first and the second common sequences can
be the same or can be different. The first and the second common sequences can be
all or part of a linker used for the creation of an existing cDNA library, or they can
be all or part of a site the library element encoding region had been inserted in. For
example, the first and the second common sequences can be all or part of a vector,
e.g., all or part of a polylinker region, or part of a naturally occurring sequence
existing adjacent to the library element encoding region, e.g., all or part of a gene,
such as a conserved sequence within a gene, e.g., a zinc finger motif, a helix loop
helix motif, or a WW domain.

In preferred embodiments, the second region of the first and the second
primers can be the same or can be different. The second region of the first and the
second primers can be homologous to a vector sequence, e.g., a polylinker site or a
sequence which flanks the insertion site, or can be homologous to a sequence in a
different nucleic acid insert molecule, e.g., a nucleic acid insert molecule intended to
be part of a final construct including a plurality of nucleic acid insert molecules. For
example, the second region of the first and the second primers can be homologous to
a restriction enzyme cleavage site, e.g., a Not I, an EcoR I, or a Hind III cleavage
site.

In preferred embodiments, the second region of the first primer is 5' to the
first region of the primer. In preferred embodiments, the second region of the
second primer is 3' to the first region of the primer.

In preferred embodiments, the host cell can be a yeast cell, e.g., a
Saccharomyces cerevisiae or Schizosaccharomyces pombe cell, a bacterial cell, e.g.,
an E. coli cell, such as, for example, the E. coli strains CJ236, NM522, 5K,
TGE7300, JM101, JM107, KM392 or LE392, or a mammalian cell, such as, for
example, a CHO, COS, C127, or a HepG2 cell.

In preferred embodiments, the vector can be linearized prior to being
introduced into the host cell. For example, the vector can be linearized by cleaving
between the first and second regions of the vector. Examples of vectors which can
be used in the methods of the invention include λgt10, λgt11, the ZAP series vectors
(Stratagene), pESP-1, pOPRSVlMSC, pGAD.GH, pVP16, pACT, pGAD424,
pGAD2F, or pJG4-5.

In preferred embodiments, the second region of the nucleic acid insert
molecule is produced by PCR, using primers having a first region which is
homologous to the 3' end of the element encoding region and a second region which
is homologous to the second region of the vector. In preferred embodiments, the
first region of the nucleic acid insert molecule is produced by PCR, using primers
having a first region which is homologous to the 5' end of the element encoding
region and a second region which is homologous to the first region of the vector.

In preferred embodiments, the second region of the nucleic acid insert
molecule is produced by the ligation of adapters having a sequence homologous to
the second region of the vector. In preferred embodiments, the first region of the
nucleic acid insert molecule is produced by the ligation of adapters having a
sequence homologous to the first region of the vector.

In preferred embodiments, the first and second regions of the nucleic acid
insert molecule can be at least 20, 30, 40, 50, 60 or more base pairs in length. In
preferred embodiments, the first and second common sequences of the nucleic acid
insert molecule can be at least 20, 30, 40, 50, 60 or more base pairs in length.

In preferred embodiments, the library element encoding region can be
obtained from an existing cDNA library, e.g., a plasmid based cDNA library or a
phage based cDNA library; an mRNA molecule, e.g., an mRNA molecule derived
from a tissue, e.g., a cancerous tissue, such as, for example, prostate cancer tissue; or
a DNA molecule, e.g., a naturally occurring DNA molecule or a synthetic DNA
molecule. The library element encoding region can be a gene or a part thereof, for
example, a promoter, a protein encoding region, a translational terminator or a
transcriptional terminator; or an intragenic sequence, e.g., an intragenic sequence
which encodes, for example, a transcriptional enhancer or silencer. In preferred
embodiments, the library element encoding region is obtained from a few cells, e.g.,
less than 10, 100, or 1,000 cells (i.e., which contain less than 100, 1,000, or 10,000
pg of RNA).

In preferred embodiments, the vector further includes an element encoding a
detectable agent, e.g., a member of a binding pair, e.g., a member of a ligand/
counter-ligand pair, an antigen, a detectable enzyme, e.g., a beta-galactosidase, an
alkaline phosphatase, a horseradish peroxidase, or a luciferase gene, which is, for
example, fused with the library element encoding region, such that the library
element encoding region can be detected.

In preferred embodiments, the DNA library can be screened in a two-hybrid
system or it can be used for screening and cloning novel genes. In preferred
embodiments, the vector can include a transcription factor activation domain and the
method can further include introducing into the host cell a nucleic acid molecule
encoding a hybrid protein, wherein the hybrid protein comprises a transcription
factor DNA-binding domain attached to a test protein; introducing into the host cell
a detectable gene, wherein the detectable gene comprises a regulator site recognized
by the DNA-binding domain and wherein the detectable gene expresses a detectable
protein when the test protein interacts with a protein encoded by the DNA library;
plating the host cell onto selective media; and selecting for the host cell containing a
DNA encoded protein which interacts with the test protein.

In another aspect, the invention features a method of constructing a DNA
library, e.g., a cDNA library, to be screened in a two-hybrid system. The method
includes:

providing a plurality of nucleic acid molecules wherein each of the nucleic
acid molecules includes, in order from 5' to 3', a first common sequence, a library
element encoding region, and a second common sequence (the term "common"
means that each molecule of the plurality includes the common sequence);

providing a plurality of first primers, each of the first primers having a first
region homologous with the first common sequence of the nucleic acid molecule and
having a second region which is not homologous with the first (and preferably
second) common sequence;

providing a plurality of second primers, each of the second primers having a
first region homologous with the second common sequence of the nucleic acid
molecule and having a second region which is not homologous with the second (and
preferably first) common sequence;

forming a reaction mixture which includes the plurality of nucleic acid
molecules, the plurality of the first primers, and the plurality of the second primers,
under conditions which provide, e.g., by primer directed synthesis, a plurality of
nucleic acid insert molecules having the following structure, in order from 5' to 3', a
second region of the first primer/the first common region/a library element encoding
region/the second common region/a second region of the second primer;

providing a plurality of host cells;

providing a vector having a first region which is homologous with the
second region of the first primer, and a second region which is homologous with the
second region of the second primer, wherein the vector further includes a
transcription factor activation domain;

introducing a vector molecule into each of the host cells;

introducing one or more of the nucleic acid insert molecules into each of the
cells under conditions which allow for recombination and gap repair to occur;

introducing into the host cell a nucleic acid molecule encoding a hybrid
protein, wherein the hybrid protein includes a transcription factor DNA-binding
domain attached to a test protein;

introducing into the host cell a detectable gene, wherein the detectable gene
comprises a regulator site recognized by the DNA-binding domain and wherein the
detectable gene expresses a detectable protein when the test protein interacts with a
protein encoded by the DNA library;

plating the host cell onto selective media; and

selecting for the host cell containing a DNA encoded protein which interacts
with the test protein.

In preferred embodiments, the DNA library can be a cDNA library, a
genomic DNA library, or a synthetic DNA library.

In preferred embodiments, homologous recombination and gap repair occur
between the vector molecule and the nucleic acid insert molecule.

In preferred embodiments, the first and the second common sequences can
be the same or can be different. The first and the second common sequences can be
all or part of a linker used for the creation of an existing cDNA library, or they can
be all or part of a site the library element encoding region had been inserted in. For
example, the first and the second common sequences can be all or part of a vector,
e.g., all or part of a polylinker region, or part of a naturally occurring sequence
existing adjacent to the library element encoding region, e.g., all or part of a gene,
such as a conserved sequence within a gene, e.g., a zinc finger motif, a helix loop
helix motif, or a WW domain.

In preferred embodiments, the second region of the first and the second
primers can be the same or can be different. The second region of the first and the
second primers can be homologous to a vector sequence, e.g., a polylinker site or a
sequence which flanks the insertion site, or can be homologous to a sequence in a
different nucleic acid insert molecule, e.g., a nucleic acid insert molecule intended to
be part of a final construct including a plurality of nucleic acid insert molecules. For
example, the second region of the first and the second primers can be homologous to
a restriction enzyme cleavage site, e.g., a Not I, an EcoR I, or a Hind III cleavage
site.

In preferred embodiments, the second region of the first primer is 5' to the
first region of the primer. In preferred embodiments, the second region of the
second primer is 3' to the first region of the primer.

In preferred embodiments, the host cell can be a yeast cell, e.g., a
Saccharomyces cerevisiae or Schizosaccharomyces pombe cell.

In preferred embodiments, the vector can be linearized prior to being
introduced into the host cell. For example, the vector can be linearized by cleaving
between the first and second regions of the vector. Examples of vectors which can
be used in the methods of the invention include the "activation domain" vectors:
pGAD.GH, pVP16, pACT, pGAD424, pGAD2F, or pJG4-5.

In preferred embodiments, the second region of the nucleic acid insert
molecule is produced by PCR, using primers having a first region which is
homologous to the 3' end of the element encoding region and a second region which
is homologous to the second region of the vector. In preferred embodiments, the
first region of the nucleic acid insert molecule is produced by PCR, using primers
having a first region which is homologous to the 5' end of the element encoding
region and a second region which is homologous to the first region of the vector.

In preferred embodiments, the second region of the nucleic acid insert
molecule is produced by the ligation of adapters having a sequence homologous to
the second region of the vector. In preferred embodiments, the first region of the
nucleic acid insert molecule is produced by the ligation of adapters having a
sequence homologous to the first region of the vector.

In preferred embodiments, the first and second regions of the nucleic acid
insert molecule can be at least 20, 30, 40, 50, 60 or more base pairs in length. In
preferred embodiments, the first and second common sequences of the nucleic acid
insert molecule can be at least 20, 30, 40, 50, 60 or more base pairs in length.

In preferred embodiments, the library element encoding region can be
obtained from an existing cDNA library, e.g., a plasmid based cDNA library or a
phage based cDNA library; an mRNA molecule, e.g., an mRNA molecule derived
from a tissue, e.g., a cancerous tissue, such as, for example, prostate cancer tissue; or
a DNA molecule, e.g., a naturally occurring DNA molecule or a synthetic DNA
molecule. The library element encoding region can be a gene or a part thereof, for
example, a promoter, a protein encoding region, a translational terminator or a
transcriptional terminator; or an intragenic sequence, e.g., an intragenic sequence
which encodes, for example, a transcriptional enhancer or silencer. In preferred
embodiments, the library element encoding region is obtained from a few cells, e.g.,
less than 10, 100, or 1,000 cells (i.e., which contain less than 100, 1,000, or 10,000
pg of RNA).

In preferred embodiments, the vector further includes an element encoding a
detectable agent, e.g., a member of a binding pair, e.g., a member of a ligand/
counter-ligand pair, an antigen, a detectable enzyme, e.g., a beta-galactosidase, an
alkaline phosphatase, a horseradish peroxidase, or a luciferase gene, which is, for
example, fused with the library element encoding region, such that the library
element encoding region can be detected.

In another aspect, the invention features a kit allowing the interchangeable
use of a DNA library in more than one application, e.g., for easy and rapid transfer
of a library insert from a first vector to a second vector. The kit includes one or
more of the primers described herein, e.g., a plurality of first oligonucleotide
primers, each of the first primers having a first region homologous with a first region
common to all inserts, e.g., all or part of a linker used in the construction of the
DNA library in the first vector, and a second region homologous with a first region
of a second vector; a plurality of second oligonucleotide primers, each of the second
primers having a first region homologous with a second region common to all
inserts, e.g., all or part of a linker used in the construction of the DNA library in the
first vector, and a second region homologous with a second region of a second
vector; and optionally any of a reaction buffer, or DNA enzyme, e.g., a ligase or a
restriction endonuclease, and instructions for use.



In preferred embodiments the kit includes one or more of: the library, e.g., a
cDNA library; the library inserted into a first vector; the second vector into which
the library is to be inserted.

In another aspect, the invention features an oligonucleotide primer described
herein, e.g., an oligonucleotide primer having a first region homologous with a
linker sequence used in the construction of a DNA library, and a second region
homologous with an insertion region of a vector required for a second application.

In another aspect, the invention features a method for screening a subject for
the existence of a lesion in a gene encoding a preselected protein. The method
includes: obtaining a tissue sample from the subject;

preparing from the tissue a plurality of nucleic acid insert molecules having
a first common region, a library element encoding region and a second common
region, wherein the library element encoding region encodes the protein or portion
thereof (the term "common" means that each molecule of the plurality includes the
common sequence);

providing a vector having a first region which is homologous to the first
common region of the nucleic acid insert molecule and a second region which is
homologous to the second common region of the nucleic acid insert molecule,
wherein the vector is suitable for use in an assay which detects the interaction
between two proteins;

providing a host cell suitable for use in an assay which detects the interaction
between two proteins;

introducing into the host cell the nucleic acid insert molecule and the vector;
and performing the assay which detects the interaction between two proteins,
thereby screening the subject for the existence of a lesion in the gene encoding the
protein.

In preferred embodiments, the plurality of the nucleic acid insert molecules
can be prepared by PCR using a first and a second primer, the first primer having a
first region including the first region of the nucleic acid insert molecule and a second
region homologous with a sequence in the library element encoding region, and the
second primer having a first region including the second region of the nucleic acid
insert molecule and a second region homologous with a sequence in the library
element encoding region. In preferred embodiments, the assay which detects the
interaction between two proteins can be a two-hybrid assay.

As used herein, the term "homologous recombination" refers to the process
by which a DNA molecule can recombine (cross over) into a homologous sequence
in another DNA molecule in, for example, a host cell. Homologous recombination
can be catalyzed by enzymes called recombinases. Examples of recombinases
include RecA, RecBCD, RAD51, and DMC1. Homologous recombination occurs
frequently in bacteria, yeast, and certain viruses, as well as in some mammalian
cells.

As used herein, the term "gap repair" refers to the process by which a host
cell (e.g., a yeast cell) repairs double stranded breaks in a DNA molecule through
homologous recombination.

As used herein, the term "homology" refers to a degree of sequence identity
between the nucleic acid sequence of two DNA molecules, sufficient to allow
homologous recombination between the two DNA molecules to occur. The two
DNA molecules can be, for example, at least 80, 90 or 100% identical.
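
The identity thresholds in this definition can be illustrated with a minimal sketch. It assumes the two regions have already been aligned to the same length; the function names and the 80% default are illustrative choices, not part of the specification.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_homology_threshold(seq_a: str, seq_b: str, threshold: float = 80.0) -> bool:
    """True if identity reaches the chosen threshold (80, 90 or 100 in the text)."""
    return percent_identity(seq_a, seq_b) >= threshold
```

For example, two 10 bp regions that differ at one position are 90% identical, so they pass an 80% or 90% threshold but not a 100% threshold.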
As used herein, the term "library element encoding region" refers to a nucleic
acid sequence or molecule which is the functional part of a nucleic acid insert
molecule, e.g., a product of reverse transcription of an mRNA
molecule. A library element encoding region can be, for example, a gene or a part
thereof, e.g., a promoter, a protein encoding region, a translational terminator or a
transcriptional terminator; or an intragenic sequence, e.g., an intragenic sequence
which encodes, for example, a transcriptional enhancer or silencer. A library
element encoding region can be obtained from an existing cDNA library, e.g., a
plasmid based cDNA library or a phage based cDNA library; an mRNA molecule,
e.g., an mRNA molecule derived from a tissue, e.g., a cancerous tissue, such as, for
example, prostate cancer tissue; or a DNA molecule, e.g., a naturally occurring DNA
molecule or a synthetic DNA molecule.

As used herein, the term "DNA library" refers to a collection of DNA
molecules, e.g., cDNA, genomic DNA, or synthetic DNA molecules, cloned into a
suitable vector. The cloned DNA molecules can be propagated in an appropriate
host cell, e.g., a bacterial cell, and can be used in applications, such as, for example,
the identification and cloning of novel genes. Examples of DNA libraries include
genomic libraries, e.g., a liver or brain cell genomic library; or cDNA libraries, e.g.,
a human B cell or liver cell cDNA library.

The method of the invention is a highly efficient, rapid, cost effective
alternative to current cDNA library construction. This method allows the
rapid construction and screening of [...] small amounts of
mRNA and it provides a universal way to [...]
that can use all the different cDNA [...] of the
vectors they are in. The method of the invention can, in many applications, replace
conventional cDNA library construction.

Other features and advantages of the invention will be apparent from the
following detailed description, and from the claims.

Detailed Description
The drawings are first briefly described.

Brief Description of the Drawings 

Figure 1 is a schematic diagram of the method used for the in vivo
construction of cDNA libraries (the gap repair process). Linear vector is
cotransformed into yeast together with cDNA that has 5' and 3' end vector sequences
added to its corresponding ends. By plating onto selective media, yeast colonies
appear that have successfully integrated the cDNA into the plasmid through
homologous recombination and gap repair.

Figure 2 is a depiction of the results of a stepwise decrease in the size of the
overlap, from 50 bp to 20 bp (a-d), between the template (Mxi1) and the linear vector.
The number of white yeast colonies increases as the homology is gradually reduced,
indicating a non-productive gap repair process.

Figure 3A is a schematic diagram of the commercially available cDNA
(Marathon-Ready cDNA, Clonetech) used to determine the applicability of the
cDNA cloning process.

Figure 3B is a depiction of an agarose gel analysis and size characterization 
of the different cDNAs cloned in vivo. The data was obtained from nine randomly 
picked yeast colonies. 

Vectors usually include a backbone and a site at which an insert can be, or is,
inserted. In many cases the insertion site will be flanked by one or more short
regions which allow for cleavage by a predetermined restriction enzyme. After
cleavage of the vector with such an enzyme, the vector has single strand overhangs
which can hybridize with appropriate single stranded ends on an insert, the single
stranded ends of which have been formed by cleavage with a predetermined enzyme.
Preferred vectors are those capable of autonomous replication. Preferred vectors can
direct expression of inserted nucleic acids. Vectors capable of directing the
expression of genes to which they are operatively linked are often referred to as
expression vectors. Plasmids, a term which refers generally to circular double
stranded DNA loops which, in their vector form, are not bound to the chromosome,
are useful in the methods of the invention. In the present specification, plasmid and
vector are used interchangeably as the plasmid is the most commonly used form of
vector. However, methods of the invention include such other forms of vectors
which serve equivalent functions.

Vectors can include selectable markers, promoters, and nucleic acids which
encode proteins which are to be fused with the protein encoded by an insert.

Numerous vectors exist for the expression of DNA libraries, in both
eukaryotic and prokaryotic cells. Examples of such vectors include λgt10, λgt11,
the ZAP series (Stratagene), pESP-1, pOPRSVlMSC and the like. Vectors suitable
for use in the two hybrid system are described below. In methods of the invention,
the vector of interest is linearized prior to introduction into the host cell. The vector
can be linearized by cleavage with an appropriate restriction enzyme. The
procedures concerning the use of restriction enzymes, their nucleotide specificity
and the appropriate reaction conditions are known to those skilled in the art and
readily available. The amounts of enzyme and DNA, the buffer and ionic
concentrations, and the temperature and duration of the reaction will vary depending
upon the specific application as described in Sambrook et al. (Molecular Cloning: A
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, NY, 1989), and other laboratory manuals.

Host Cells 

Cells which can support homologous recombination are suitable for use as
host cells. Such cells include cells which have been genetically engineered to
support homologous recombination.

Yeast cells, for example, Saccharomyces cerevisiae or Schizosaccharomyces
pombe, are suitable host cells. Strains of yeast that are of particular interest to the
present invention include the two-hybrid system reporter strains Y153, containing
the GAL1-HIS3 and GAL1-lacZ reporters and the trp1 and leu2 transformation
markers (Bartel et al., Methods in Enzymology, 254:241-263, 1995); CTY1, containing
the GAL1-HIS3 and GAL1-lacZ reporters and the his3, trp1 and leu2 transformation
markers (Chien et al., PNAS 88: 9578, 1991); CTY10-5d, containing the lexA-lacZ
reporters and the his3, trp1, and leu2 transformation markers (Chien et al., PNAS 88:
9578, 1991); YBP2, containing the GAL1-HIS3 and (GAL17mers)-lacZ reporters
and the trp1 and leu2 transformation markers (Chien et al., PNAS 88: 9578, 1991);
and GGY1::171, containing the GAL1-lacZ reporter and the his3 and leu2
transformation markers (Gill et al., Cell 51:113, 1987).

Bacterial cells can also be used as host cells. E. coli cells, such as the E. coli
strains CJ236 (Kunkel et al., 1987), NM522 (Gough and Murray, 1983), 5K and
TGE7300 (Degryse, 1991b), JM101, JM107, KM392 or LE392, which have
recombinational activity, can be used. Bacillus subtilis cells, which have
recombinational activity, may also be used. A wide variety of mammalian cells, such
as CHO, COS, C127, and HepG2 cells, as well as certain viruses, in which
recombination occurs, can also be used.

Appropriate conditions for the growth of host cells, such as types of media
(both liquid and solid), temperature and duration of incubation are known in the art,
see, e.g., Sambrook et al. and "Culture of Animal Cells: A Manual of Basic
Technique", Freshney R.I., Third Edition, Wiley-Liss 1994. Methods for isolating
discrete cell colonies or plaques, as well as plasmid DNA from such colonies or
plaques, are known in the art, and include plating the cells on selective media so that
colonies or plaques are formed, lysing the cells by detergents, removing proteins by
protease treatment, and purification of plasmid DNA through a CsCl gradient. The
latter step can also be performed using commercially available DNA binding
matrices in the form of columns (e.g., Qiagen Kit).

A nucleic acid insert molecule and vector can be introduced into prokaryotic
or eukaryotic cells by any suitable method, e.g., by transformation or transfection.
As used herein, the term "transformation" refers to methods for introducing foreign
nucleic acid molecules (e.g., DNA) into a bacterial host cell. As used herein, the
term "transfection" refers to methods for introducing foreign nucleic acid molecules
(e.g., DNA) into a mammalian host cell. Methods for introducing a nucleic acid
molecule into a host cell include "heat shock" transformation, calcium phosphate or
calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection,
or electroporation. For yeast cells, treatment with lithium acetate or lithium
chloride presents another alternative for efficient transfection. Suitable methods for
transforming or transfecting host cells can be found in Sambrook et al.

Library Element Encoding Region

A library element encoding region can be derived from a variety of sources.
For example, a library element encoding region can be derived from tissue mRNA,
an existing cDNA library (e.g., plasmid or phage based) or a naturally occurring or
synthetic DNA molecule. Only a small amount of the starting material is needed. In
fact, when the library element encoding region is derived from tissue mRNA,
mRNA from one or a few cells (e.g., less than 10, 100, or 1,000 cells) is sufficient
to produce a DNA library. This is particularly useful when heterogeneous tissue
populations are used. Such heterogeneous tissue populations include cancer tissues.
For example, using laser techniques a few cells can be separated from a cancerous
prostate tissue. Using these cells and the methods of the invention, described below,
cDNA libraries of cancerous prostate cells can be produced rapidly.

An existing DNA library, e.g., a cDNA library, can also be used as the source
of library element encoding regions. A wide variety of cDNA libraries are available.
Methods of the invention allow use of a library, designed for one application, in
order to produce another library suitable for use in a different application, with very
little experimental manipulation and effort. This can be achieved by simply using
primers, e.g., PCR primers, containing a first region homologous with the nucleotide
sequence in the linkers used during the construction of the existing cDNA library
and a second region homologous with either a first or a second region in a vector,
e.g., the terminal ends of the vector, appropriate for a particular application (see
Figure 3A).

A synthetic DNA molecule can be used as the source of the library element
encoding region. For example, methods which generate populations of non-identical
nucleic acid molecules, e.g., PCR with low fidelity Taq polymerase, can be used to
generate library element encoding regions. These can be used in a two-hybrid
assay, described below, in order to screen and identify, for example, proteins with a
better affinity for a particular substrate.

Preparation of Nucleic Acid Insert Molecules 

Nucleic acid insert molecules of the invention include a first region, a library
element encoding region and a second region. The first and second regions have
sufficient homology with a vector molecule such that homologous recombination
can occur between a nucleic acid insert molecule and a vector molecule. The first
and second regions flanking a nucleic acid insert molecule can be produced by PCR,
using primers having a first and second region homologous with a first and second
region in the vector, respectively. The use of PCR is known in the art and is
described in U.S. Patent 4,683,202, the contents of which are expressly incorporated
herein by reference. The technique is also described in several general sources, see,
e.g., Sambrook et al. and "PCR Protocols, A Guide to Methods and Applications"
(Innis et al. eds.), Academic Press, San Diego, CA, 1990. The Taq polymerase
(Promega) and more preferably, either the Pfu (Stratagene) or Vent (New England
Biolabs) polymerases can be used. The latter two have a proofreading ability and
can, therefore, eliminate the introduction of errors in the PCR product during
amplification. The resulting PCR products (i.e., the nucleic acid insert molecules)
can be isolated by agarose or acrylamide gel electrophoresis followed by elution of
the nucleic acid insert molecule from the agarose or acrylamide matrix. The two
most common ways of elution are either soaking in an appropriate buffer or
electroelution, both described in Sambrook et al. Both methods are effective, but
soaking is often the method of choice because it is inexpensive, easy and can be
accomplished without monitoring. Kits for the purification of DNA from gel
matrices may also be used (e.g., "Compass Kit" by American Bioanalytical). The
resulting nucleic acid insert molecule can also be purified using reverse phase or
anion-exchange HPLC.

The primer oligonucleotides used in the PCR reaction may be synthesized
using commercially available solid phase oligonucleotide synthesis machines
(Needham-VanDevanter, D. R., et al., Nucleic Acids Res., 12:6159-6168, 1984), or
chemically synthesized using the solid phase phosphoramidite triester method
described by Beaucage et al. (Beaucage et al., Tetrahedron Letts. 22, No. 20:1859-1862,
1981). Oligonucleotides are preferably purified prior to use. Purification of
oligonucleotides can be performed using reverse phase or anion-exchange HPLC
and may also be carried out by denaturing or native polyacrylamide gel
electrophoresis.

The first and second regions of a nucleic acid insert molecule, having
homology to a vector, can be added to a library element encoding region by the
ligation of adapters having a sequence homologous to the terminal ends of the
vector. As used herein, the term "adapter" refers to a, preferably short, double
stranded DNA sequence [...] to the ends of another DNA molecule.
The adapter can be a synthetic DNA molecule, e.g., synthesized using a solid phase
phosphoramidite triester method, or it can be a natural DNA molecule, e.g.,
produced by digestion using the appropriate restriction endonucleases. The adapter
can be joined to the library element encoding region by ligation. Taq DNA ligase,
the E. coli DNA ligase, or more preferably, T4 DNA ligase can be used.

The first and second regions of the nucleic acid insert molecule (i.e., the
regions flanking the library element encoding region) can be of any size which
supports an acceptable frequency of recombination, the size of the homologous
region between the nucleic acid insert molecule and the vector sequences usually
being linearly related to the frequency of recombination. However, a minimum of [...] bp is
preferred for efficient recombination to occur (see Figure 2). Preferably, the first
and second regions of the nucleic acid insert molecule have a length of at least 30,
40, 50, or 60 bp.

Methods of the invention can be used to produce DNA libraries from tissue
mRNA. In such cases, the first and second regions of the nucleic acid insert
molecule (flanking the library element encoding region) can be added directly during
the library element encoding region synthesis from mRNA. Nucleic acid insert
molecules can be synthesized from mRNA, using a first primer having a first region
homologous with the polyT sequence of the mRNA, and a second region
homologous with a first region in a vector of interest. First, intact mRNA is
hybridized to the first primer. The mRNA is then copied by reverse transcriptase to
produce an RNA-DNA hybrid, which can be isolated by standard methods (e.g.,
chloroform extraction and ethanol precipitation). The RNA from the RNA-DNA
hybrid can be removed with the enzyme RNaseH, and E. coli DNA polymerase I
can be added to fill in the gaps and produce a double stranded DNA molecule which
contains in its 3' end the second region of the first primer (which is homologous with
a first region in a vector of interest). An adapter containing a region homologous to
a second region in the vector of interest can then be added, e.g., ligated to, the 5' end
of the library element encoding region, as described above. The resulting nucleic
acid insert can then be introduced into an appropriate host cell, along with the
vector, e.g., the linearized vector of interest.

An existing cDNA library, for example, a phage or plasmid based library,
can also be the source of the library element encoding region. Existing libraries
generally have a cDNA or other library element encoding region inserted between a
first and second common sequence, e.g., a first and a second linker sequence. In
such cases, a first and a second oligonucleotide primer can be designed to contain a
first region homologous to the common sequence, i.e., the linkers used during the
construction of the existing cDNA library, and a second region homologous with a
first and a second region in a vector of interest, respectively (see Figure 3A). These
primers can be used in a PCR amplification reaction to produce nucleic acid insert
molecules which contain a first and a second region homologous with a first and a
second region in the vector of interest, respectively. Both the vector and the nucleic
acid insert molecule can be introduced into a host cell, as described above. Through
homologous recombination and gap repair the host allows the nucleic acid insert
molecule to be inserted into the vector, to thereby produce a new DNA library.
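
The primer layout described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure of the specification: all four input regions are assumed to be given 5' to 3' on the top strand in the order vector, linker, insert, linker, vector, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_transfer_primers(vector_first: str, linker_first: str,
                            linker_second: str, vector_second: str):
    """Build a primer pair whose 3' ends anneal to the library linkers and
    whose 5' tails carry the vector-homologous regions.

    Inputs are top-strand sequences, 5' to 3' (an assumption of this sketch).
    """
    # Forward primer: vector tail (5') followed by the linker-annealing part (3').
    forward = vector_first + linker_first
    # Reverse primer anneals to the top strand, so it is the reverse complement
    # of linker followed by vector; its 5' tail is the vector region.
    reverse = reverse_complement(linker_second + vector_second)
    return forward, reverse
```

Amplifying the library with such a pair would give each product the two vector-homologous ends that gap repair requires.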

A population or library of DNAs can be modified in terms of content. For
example, the population or library can be enriched for molecules having particular
sequence motifs by amplification or subtractive methods. For example, degenerate
primers can be used that selectively amplify a particular subset of DNAs, such as
DNAs which encode proteins with zinc finger motifs, helix-loop-helix domains,
WW domains, leucine zipper domains, and the like. Oligonucleotide primers can be
synthesized to contain a first region homologous with a conserved nucleotide
sequence present in the particular subset of DNA to be amplified, and a second
region homologous with either a first or a second region in the vector molecule of
interest. Such conserved nucleotide sequences are those present, for example, in
genes which encode proteins with zinc finger motifs (e.g.,
Cys-Xaa2-Cys-Xaa1-3-Cys-Xaa2-Cys) (SEQ ID NO:12), WW domains (e.g.,
Pro-Xaa-Xaa-Trp-Xaa[...]-Trp-Xaa-Xaa-Pro) (SEQ ID NO:13) or the G protein alpha
subunits from cochlear tissues
(Tachibana et al., Hear Res 62:82-8, 1992). As described above, these primers can
be used in a PCR amplification reaction to produce a nucleic acid insert molecule,
the nucleic acid insert molecule and a vector molecule can be introduced into a host
cell, and through homologous recombination and gap repair, the nucleic acid insert
molecule can be inserted into the vector, to produce a DNA library.
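
As a rough illustration of how a conserved peptide motif maps onto a degenerate primer region, the sketch below back-translates residues into IUPAC codes and tests whether a concrete DNA stretch is covered. The codon table is limited to the residues named in the motifs above, the names are hypothetical, and the vector-homologous second region of the primer is left out.

```python
import re

# Degenerate codons in IUPAC notation (Y = C or T, N = any base).
CODON_BY_RESIDUE = {"Cys": "TGY", "Trp": "TGG", "Pro": "CCN", "Xaa": "NNN"}
IUPAC_PATTERN = {"A": "A", "C": "C", "G": "G", "T": "T",
                 "Y": "[CT]", "N": "[ACGT]"}


def degenerate_primer(residues):
    """Back-translate a list of three-letter residue names into degenerate DNA."""
    return "".join(CODON_BY_RESIDUE[r] for r in residues)


def matches_degenerate(primer: str, dna: str) -> bool:
    """True if a concrete DNA sequence is one of the sequences the primer covers."""
    pattern = "".join(IUPAC_PATTERN[base] for base in primer)
    return re.fullmatch(pattern, dna.upper()) is not None


# First four positions of the zinc finger motif: Cys-Xaa-Xaa-Cys.
ZINC_FINGER_START = ["Cys", "Xaa", "Xaa", "Cys"]
```

Here `degenerate_primer(ZINC_FINGER_START)` gives `TGYNNNNNNTGY`, which covers any 12-base stretch encoding Cys, two arbitrary codons, then Cys.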

A DNA library can also be produced by the introduction of a plurality of
nucleic acid insert molecules and a vector molecule into a host cell. For example,
three nucleic acid insert molecules (1-3) can be introduced into the host cell along
with the vector of interest. Each nucleic acid insert molecule has a first and a second
region. The vector also has a first and a second region. The first nucleic acid insert
molecule has a first region homologous with the first region of the vector and a
second region homologous with the first region of the second nucleic acid insert
molecule. The second nucleic acid insert molecule has a first region homologous
with the second region of the first nucleic acid insert molecule and a second region
homologous with the first region of the third nucleic acid insert molecule. The third
nucleic acid insert molecule has a first region homologous with the second region of
the second nucleic acid insert molecule and a second region homologous with the
second region of the vector. The regions are sufficiently homologous so as to allow
homologous recombination and gap repair to occur between the nucleic acid insert
molecules and the vector, once these are introduced into a host cell.

In Vivo DNA Libraries and the Two-Hybrid Assay

DNA libraries produced by homologous recombination and gap repair, e.g.,
in yeast, can be used for screening of expressed proteins using the two-hybrid
system (described in U.S. Patent No. 5,283,317 and WO94/10300, the contents of
which are incorporated herein by reference), in order to identify proteins which bind
to or interact with a protein of interest. The two-hybrid system is based on the use of
transcription factors having a "modular" nature, i.e., having separable DNA-binding
and activation domains. Briefly, the assay utilizes two different DNA
constructs. In one construct, the gene that codes for a protein of interest ("bait") is
fused to a gene encoding the DNA binding domain of a known transcription factor
(e.g., GAL-4). In the other construct, the cDNA library, which encodes an
unidentified protein ("prey" or "sample"), is fused to a gene which codes for the
activation domain of the known transcription factor. If the "bait" and the "prey"
proteins are able to interact in vivo, forming a complex, the DNA-binding and
activation domains of the transcription factor are brought into close proximity. This
proximity allows transcription of a reporter gene (e.g., LacZ) which is operably
linked to a transcriptional regulatory site responsive to the transcription factor.
Expression of the reporter gene can be detected and cell colonies containing the
functional transcription factor can be isolated and used to obtain the cloned gene
which encodes the protein interacting with the protein of interest.

Examples of yeast vectors which are useful for the methods of the invention
include the "activation domain" vectors: pGAD.GH, pVP16, pACT, pGAD424,
pGAD2F and pJG4-5. Important features of these vectors are the ADH1 promoter
which drives the expression of either the GAL-4 activation domain, the E. coli B42
activator, or the herpes virus VP16 gene, and the ADH1 terminator. Also included
in these vectors are the 2μ yeast origin of replication, an E. coli origin of
replication, an E. coli selectable marker for ampicillin resistance, and yeast
selectable markers, such as LEU2 or TRP1.

The in vivo cloning process can also be used to make DNA libraries for use
in an application other than the two-hybrid assay. Such applications include
screening of a DNA library by hybridization with a nucleic acid or an antibody
probe in order to clone and identify novel genes. The screening procedure is usually
performed on bacterial colonies, containing plasmids, or on bacteriophage plaques.
In the case that the DNA library is constructed in yeast, the yeast colonies can be
pooled and the library plasmids rescued en masse, following successful gap repair.
The plasmids can then be used to transform bacteria, plated out and screened using
radioactive probes or antibodies.

The methods of the invention can be used in the context of the two-hybrid
system to screen patients (e.g., cancer patients) for lesions in a gene encoding a
particular protein (e.g., Mxi1). For example, using tissue from a prostate cancer, a
set of nucleic acid insert molecules can be produced as described above. These
nucleic acid insert molecules can then be transformed into a yeast reporter strain
along with a vector containing the activation domain of the Max protein. Mxi1
mutants unable to interact with Max will be unable to drive expression of the
reporter gene, present in the yeast reporter strain, and as a result yeast cells will be
unable to grow in a particular selective medium. By comparing the growth on plates
lacking the selection marker versus the growth on plates including the selection
marker, colonies containing an Mxi1 mutant can be identified. The afore-mentioned
procedure can be used to screen patients suffering from any state or condition in
which a lesion in a gene encoding a particular protein might affect its interaction
with another protein.


Kits 

The invention includes kits which allow the interchangeable use of a DNA
library in more than one application. The kits provide primers which allow efficient
transfer of a library element encoding region from a first vector to a second vector.
The kits provide primers having a first region homologous with the linker sequence
used in the construction of the DNA library and a second region homologous with
either a first or a second region in a vector molecule required for a particular
application. The kit can include the primers, e.g., arranged according to which DNA
library or vector they are homologous with, as well as one or more of the following:
buffers, enzymes, the library inserted in the first vector, the (second) vector into
which the library is to be inserted, and instructions for use of the kit. The contents
of the kit can be packaged in a suitable container.

The kit can include the library in a first vector, and primers for inserting it 
into a second vector. The second vector can also be included. 

For example, the kit can provide primers suitable for introduction of existing
DNA libraries into the pGAD424 vector, so that the libraries can be screened in a
two-hybrid assay. In such a case the PCR primers could have a sequence of
5'-GAATTCNNNNNNN-3' (SEQ ID NO:9) and 5'-AGATCTNNNNNNN-3' (SEQ ID
NO:10), where the GAATTC and AGATCT sequences correspond to the EcoRI and
BglII sites, respectively, present in the polylinker region of the pGAD424 vector,
and the NNNNNNN sequences correspond to the sequences of the linkers used in
the construction of an existing DNA library (which can be the same or different).
These sequences can vary depending on which DNA library is used. For example,
when the Clonetech human brain (cat# HL4004AH), human bone marrow (cat#
HL4022AB), human lymph node (cat# HL4023AB), human fetal liver (cat#
HL4029AH), or mouse 11-day embryo (cat# ML4005AB) MATCHMAKER cDNA
libraries are used, the NNNNNNN sequences will correspond to
AATTCGCGGCCGCGTCGAC (SEQ ID NO:11), the nucleotide sequence of the
EcoRI-NotI-SalI adaptor built in these cDNA libraries.
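
The substitution of a library-specific linker for the NNNNNNN placeholder can be written out as a short sketch. The templates and adaptor sequence are taken from the paragraph above; the function and constant names are hypothetical, and the code performs only the textual substitution, with no claim about strand orientation or annealing behaviour.

```python
import re

ECORI_TEMPLATE = "GAATTCNNNNNNN"            # SEQ ID NO:9 as written in the text
BGLII_TEMPLATE = "AGATCTNNNNNNN"            # SEQ ID NO:10 as written in the text
MATCHMAKER_ADAPTOR = "AATTCGCGGCCGCGTCGAC"  # SEQ ID NO:11 as written in the text


def fill_linker(template: str, linker: str) -> str:
    """Replace the first run of N placeholders in a primer template with a linker."""
    filled, count = re.subn(r"N+", lambda match: linker, template, count=1)
    if count == 0:
        raise ValueError("template has no N placeholder")
    return filled
```

With these inputs, `fill_linker(ECORI_TEMPLATE, MATCHMAKER_ADAPTOR)` yields the EcoRI site followed directly by the adaptor sequence; a different library would only require a different `linker` argument.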

Primers of the invention allow small amounts of an existing DNA library
constructed for a particular application to be transferred into a different vector
molecule and/or host cell suitable for another application. A simple PCR
amplification, using the appropriate primers provided in the kit, followed by
transfection of the nucleic acid insert molecule and the vector molecule into a host
cell would result in the production of the desired DNA library.

The following examples, which further illustrate the invention, should not be
construed as limiting.

Examples 

1. Gap repair cloning using different sizes of overlap of DNA sequences between the
Mxi1 cDNA and the pJG4-5 yeast vector.

In this example, the minimum overlap homology required for successful gap
repair cloning was determined. The Mxi1 cDNA (Zervos A.S. et al., Cell, 75:223-232,
1993), coding the short form of Mxi1 protein of 191 amino acids, cloned
unidirectionally into the EcoRI/XhoI sites of the pJG4-5 yeast expression vector
(pTZ10.1) was used. The pJG4-5 vector (see Figure 1) contains a nuclear
localization sequence to maximize intranuclear concentration, the B42 transcription
activation domain, a hemagglutinin epitope (HA) to facilitate detection, an ADH1
transcription terminator, a 2μ origin of replication, a TRP1 selectable marker, and a
GAL1 inducible promoter that drives the expression of the chimeric gene.

Using different PCR primers, increasing stretches of vector flanking
sequence were added to both the 5' and 3' ends of the Mxi1 cDNA. The 5' primers
corresponded to the sequence of the HA tag and the GAL1 promoter and the 3' end
primers encoded sequence from the ADH terminator. Primers that added 50 (SEQ
ID NOs:1 and 2), 40 (SEQ ID NOs:3 and 4), 30 (SEQ ID NOs:5 and 6) and 20 (SEQ
ID NOs:7 and 8) bp of vector sequence to the ends of the Mxi1 cDNA were used.
The Mxi1 PCR product was then transfected into yeast used in a modified
two-hybrid system (Gyuris J. et al. Cell, 75:791-803, 1993) together with the pJG4-5
plasmid that had been linearized using EcoRI and XhoI restriction enzymes. The
yeast were plated onto selective plates lacking Ura, His and Trp, and two days
later colonies appeared.

Successful gap repair was monitored by plating yeast on X-gal plates. Mxi1
that has been successfully incorporated into the yeast expression plasmid will form
an Mxi1 fusion protein which will interact with the LexA-Max bait, already present
in the yeast strain, and the yeast will turn blue. Incomplete gap repair will lead to an
Mxi1 sequence out of frame with the vector sequence, and the yeast colonies will
appear white. Figure 2 shows the results obtained from this assay. The number of
white yeast colonies increases as the size of the overlap between the template
(Mxi1) and the linear vector is reduced from 50 bp to 20 bp (a-d), indicating a
non-productive gap repair process. These results show that a minimum overlap
homology of 30 bp on both the 5' and the 3' ends of the template and the linear
plasmid should be used for successful gap repair cloning.
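
The overlap requirement can be pictured with a small in silico sketch. The
Python code below is a simplified string model and not a description of the
cellular mechanism: it measures how much of each insert end matches the
corresponding end of a linearized vector and joins the two only when both
overlaps reach a chosen minimum (30 bp by default, following the result above).
The function names and the toy sequences are hypothetical and were chosen only
for illustration.

```python
from typing import Optional, Tuple


def terminal_overlap(insert: str, left_arm: str, right_arm: str) -> Tuple[int, int]:
    """Longest insert ends that match the ends of the linearized vector."""
    left = right = 0
    for n in range(1, min(len(insert), len(left_arm)) + 1):
        if insert[:n] == left_arm[-n:]:
            left = n
    for n in range(1, min(len(insert), len(right_arm)) + 1):
        if insert[-n:] == right_arm[:n]:
            right = n
    return left, right


def gap_repair(insert: str, left_arm: str, right_arm: str,
               min_overlap: int = 30) -> Optional[str]:
    """Return the joined sequence, or None if either overlap is too short."""
    left, right = terminal_overlap(insert, left_arm, right_arm)
    if left < min_overlap or right < min_overlap:
        return None
    return left_arm + insert[left:len(insert) - right] + right_arm


# Arbitrary toy sequences (assumptions, not taken from any real vector).
LEFT_ARM = "G" * 10 + "TACCGATTCAGGCTAACGTTAGCCATGCAA"
RIGHT_ARM = "GGATCCTTGACGATACGTCTGAAGCTTGCA" + "C" * 10
CDNA = "ATGGCTAAAGGTTAA"


def with_flanks(cdna: str, n: int) -> str:
    """Add n bases of vector sequence to each end of a toy cDNA."""
    return LEFT_ARM[-n:] + cdna + RIGHT_ARM[:n]


repaired = gap_repair(with_flanks(CDNA, 30), LEFT_ARM, RIGHT_ARM)
rejected = gap_repair(with_flanks(CDNA, 20), LEFT_ARM, RIGHT_ARM)
```

In this model the insert carrying 30 bases of flanking sequence is joined to
the vector arms, while the insert carrying only 20 bases is rejected. The hard
cutoff is a simplification of the graded effect reported in Figure 2.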

PCR amplification and addition of the flanking sequences was performed
essentially as follows. The oligos used for PCR were:

5' GAG ATG CCT CCT ACC CTT ATG ATG 3'  -50 (SEQ ID NO:1)
5' GAT TGG ACA CTT GAC CAA ACC TCT 3'  +50 (SEQ ID NO:2)

5' CTA CCC TTA TGA TGT GCC AGA TTA 3'  -40 (SEQ ID NO:3)
5' TTG ACC AAA CCT CTG GCG AAG AAG 3'  +40 (SEQ ID NO:4)

5' GAT GTG CCA GAT TAT GCC TCT CCC 3'  -30 (SEQ ID NO:5)
5' CTC TGG CGA AGA AGT CCA AAG CTT 3'  +30 (SEQ ID NO:6)

5' GAA GTC CAA AGC TTG AG 3'  +20 (SEQ ID NO:7)
5' ATT ATG CCT CTC CCG 3'  -20 (SEQ ID NO:8)
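
The relationship between these oligos can be checked with a short script. The
Python sketch below uses the sequences given for SEQ ID NOs:1-8 and tests
whether each primer, read from its eleventh base onward, runs into the start of
the next primer in the series, which is what would be expected if successive
primers anneal 10 bp closer to the cloning site. The function name
tiles_in_steps is hypothetical.

```python
# Primers keyed by their nominal distance (in bp) from the cloning site.
PRIMERS_5 = {
    50: "GAGATGCCTCCTACCCTTATGATG",  # SEQ ID NO:1
    40: "CTACCCTTATGATGTGCCAGATTA",  # SEQ ID NO:3
    30: "GATGTGCCAGATTATGCCTCTCCC",  # SEQ ID NO:5
    20: "ATTATGCCTCTCCCG",           # SEQ ID NO:8
}
PRIMERS_3 = {
    50: "GATTGGACACTTGACCAAACCTCT",  # SEQ ID NO:2
    40: "TTGACCAAACCTCTGGCGAAGAAG",  # SEQ ID NO:4
    30: "CTCTGGCGAAGAAGTCCAAAGCTT",  # SEQ ID NO:6
    20: "GAAGTCCAAAGCTTGAG",         # SEQ ID NO:7
}


def tiles_in_steps(primers: dict, step: int = 10) -> bool:
    """True if each primer, shifted by `step` bases, reads into the next one."""
    ordered = [primers[k] for k in sorted(primers, reverse=True)]
    for outer, inner in zip(ordered, ordered[1:]):
        tail = outer[step:]
        n = min(len(tail), len(inner))
        if tail[:n] != inner[:n]:
            return False
    return True


print(tiles_in_steps(PRIMERS_5), tiles_in_steps(PRIMERS_3))
```

Both series pass this check, which is consistent with the -50/-40/-30/-20 and
+50/+40/+30/+20 labels attached to the oligos.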

The 5' end corresponds to the DNA sequence upstream of the EcoRI cloning
site of pJG4-5 and encodes part of the transcription activator and the HA epitope
tag. The 3' end corresponds to part of the ADH terminator sequence of the vector.
PCR was performed using 10 ng of pTZ-Mxi1 as template and 100 ng each of the
two primers in a 50 µl reaction volume. A program of 24 cycles was used consisting
of: 30 seconds at 94° C, 1 minute at 65° C and 1 minute at 72° C. The PCR product
was gel purified, ethanol precipitated, resuspended in 50 µl TE and 5 µl was used
along with 100 ng of linear plasmid to transform yeast using a variation of the
lithium acetate method (Ito H. et al. J. Bacteriol., 153:163-168, 1983).
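
For planning purposes, the cycling program described above can be written down
as data and its nominal hold time computed. The Python sketch below is only an
illustration: the Step class and program_seconds function are hypothetical
names, and ramp times between temperatures, which depend on the instrument, are
ignored.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Step:
    temp_c: float   # hold temperature in degrees Celsius
    seconds: int    # hold time in seconds


def program_seconds(cycle_steps: Sequence[Step], cycles: int,
                    initial: Optional[Step] = None) -> int:
    """Total hold time of a cycling program, excluding ramp times."""
    total = initial.seconds if initial else 0
    total += cycles * sum(step.seconds for step in cycle_steps)
    return total


# Program of Example 1: 24 cycles of 94 C / 65 C / 72 C.
EXAMPLE_1 = [Step(94, 30), Step(65, 60), Step(72, 60)]
hold_minutes = program_seconds(EXAMPLE_1, cycles=24) / 60
print(hold_minutes)  # 60.0
```

The optional initial step allows a program with a separate initial
denaturation to be described in the same way.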

2. Preparation of cDNA for in vivo Cloning

30 bp of vector flanking sequences were added to both the 5' and 3' ends of a
commercially available cDNA library (Marathon-Ready cDNA, Clontech, Cat#
7440-1) by PCR, using primers [5' GAT GTG CCA GAT TAT GCC TCT CCC
GAA TTC GCC GCC CGG GCA GGT 3'] (SEQ ID NO:9) and [5' CTC TGG CGA
AGA AGT CCA AAG CTT CTC GAG TTC TAC AAT THA GTG 3'] (SEQ ID
NO:10). Underlined regions of these primers are complementary to the 5'
and 3' ends of the linkers used during synthesis of the cDNA (Clontech) and the rest
corresponds to the flanking DNA sequence 5' and 3' of the EcoRI and XhoI cloning
sites of the pJG4-5 vector. PCR was performed in 50 µl reactions containing 5 µl of
the cDNA and 250 ng of each of the two primers, using a PTC-100™ (MJ
Research, Inc.) cycler, programmed for 3 minutes at 94° C, followed by 30 cycles
consisting of 30 seconds at 94° C, 30 seconds at 56° C and 3 minutes at 68° C. The
PCR amplified cDNA was ethanol precipitated, resuspended in the original volume
with TE buffer and different amounts were used along with 0.5 µg of linear pJG4-5
vector to transform the yeast strain. Using 0.5 µg of vector, maximum transformation
efficiency was obtained with 10-15 µl of the PCR amplified cDNAs.
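
The layout of the primers used in this example can be made explicit with a
short sketch. The Python code below splits a primer into a vector flanking
portion, a cloning site and the remaining 3' portion that anneals to the cDNA
linker. The function name split_primer is hypothetical. Only the first primer
is analyzed in full; for the second primer only the leading 30 bases are used,
because its 3' portion is not clearly legible in the text above.

```python
FLANK_5 = "GATGTGCCAGATTATGCCTCTCCC"  # same sequence as the -30 primer (SEQ ID NO:5)
FLANK_3 = "CTCTGGCGAAGAAGTCCAAAGCTT"  # same sequence as the +30 primer (SEQ ID NO:6)
ECORI = "GAATTC"
XHOI = "CTCGAG"


def split_primer(primer: str, flank: str, site: str):
    """Split a primer into (vector flank, cloning site, remaining 3' portion)."""
    if not primer.startswith(flank + site):
        raise ValueError("primer does not begin with the expected flank and site")
    rest = primer[len(flank) + len(site):]
    return flank, site, rest


forward = "GATGTGCCAGATTATGCCTCTCCCGAATTCGCCGCCCGGGCAGGT"
flank, site, linker_part = split_primer(forward, FLANK_5, ECORI)
vector_bases = len(flank) + len(site)

# Leading portion of the second primer only (assumption: the tail is omitted).
reverse_prefix = "CTCTGGCGAAGAAGTCCAAAGCTTCTCGAG"
_, reverse_site, _ = split_primer(reverse_prefix, FLANK_3, XHOI)

print(vector_bases, linker_part)
```

In this reading, each primer contributes 24 bases of flanking sequence plus a
6-base cloning site, which appears consistent with the 30 bp of vector sequence
stated above.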

Colonies appeared after two days. Linear vector alone gave very few
colonies whereas transformation efficiencies greater than 10^ per µg
were obtained with gap vector and cDNA. Several independent yeast colonies were
isolated, grown overnight in liquid media and used to extract the pJG4-5-cDNA
plasmid.

3. Characterization of cDNAs Isolated After in vivo Cloning

In this example, the number of copies and the size of the different cDNAs
cloned in vivo were characterized. Each yeast clone contained only a single plasmid
which represented a successful gap repair of a unique cDNA and the vector.

Nine yeast colonies were randomly picked, grown overnight in liquid media
and used to extract the pJG4-5-cDNA plasmid by standard procedures (Zervos A.S.
et al. Cell, 75:223-232, 1993). Using primers [5' GAT GTG CCA GAT TAT GCC
TCT CCC 3' -30] (SEQ ID NO:5) and [5' CTC TGG CGA AGA AGT CCA AAG
CTT 3' +30] (SEQ ID NO:6), flanking the cDNA, the inserts were amplified by
PCR, digested with restriction enzymes and analyzed on a 1% agarose gel. The
clones had cDNA inserts varying in size from 300 bp to 2.3 kb (see Figure 3; the
marker used was a lambda DNA-BstEII digest). This result shows that the in vivo
cloning method of the invention does not preferentially clone a particular size of
cDNA. All nine inserts were partially sequenced and found to represent different,
distinct cDNAs.
Equivalents

Those skilled in the art will recognize, or be able to ascertain, using no more 
than routine experimentation, many equivalents to the specific embodiments of the 
invention described herein. Such equivalents are intended to be encompassed by the 
following claims. 
CLAIMS:

1. A method for constructing a DNA library in vivo, comprising:
providing a plurality of host cells;

providing a vector having a first region and a second region;

providing a nucleic acid insert molecule having a first common
region which is homologous with said first region of the vector, a second common
region which is homologous with said second region of the vector, and a library
element encoding region disposed between said first common region and said
second common region, wherein when the library element encoding region encodes
a naturally occurring sequence, the first and second regions are not naturally found
adjacent to the library element encoding region;

introducing a vector molecule into each of the host cells;

introducing a nucleic acid insert molecule into each of said cells,
wherein a different library element encoding region is introduced into each of said
cells; and

allowing homologous recombination and gap repair between a vector
molecule and a nucleic acid insert molecule to occur,
thereby constructing a DNA library.

2. A method of preparing a plurality of nucleic acid insert molecules,
comprising:

providing a plurality of nucleic acid molecules wherein each of the
nucleic acid molecules includes, in order from 5' to 3', a first common sequence, a
library element encoding region, and a second common sequence;

providing a plurality of first primers, each of said first primers having
a first region homologous with the first common sequence of the nucleic acid
molecule and having a second region which is not homologous with said first (and
preferably second) common sequence; and

providing a plurality of second primers, each of said second primers
having a first region homologous with the second common sequence of the nucleic
acid molecule and having a second region which is not homologous with said
second (and preferably first) common sequence;

forming a reaction mixture which includes said plurality of nucleic
acid molecules, said plurality of said first primers, and said plurality of said second
primers, under conditions which provide a plurality of nucleic acid insert molecules
having the following structure, in order from 5' to 3', a second region of said first
primer/said first common region/a library element encoding region/said second
common region/a second region of said second primer,

thereby preparing a plurality of nucleic acid insert molecules.

3. A method of constructing a DNA library, comprising:

providing a plurality of nucleic acid molecules wherein each of said
nucleic acid molecules includes, in order from 5' to 3', a first common sequence, a
library element encoding region, and a second common sequence;

providing a plurality of first primers, each of said first primers having
a first region homologous with the first common sequence of the nucleic acid
molecule and having a second region which is not homologous with said first (and
preferably second) common sequence;

providing a plurality of second primers, each of said second primers
having a first region homologous with the second common sequence of the nucleic
acid molecule and having a second region which is not homologous with said
second (and preferably first) common sequence;

forming a reaction mixture which includes said plurality of nucleic
acid molecules, said plurality of said first primers, and said plurality of said second
primers, under conditions which provide a plurality of nucleic acid insert molecules
having the following structure, in order from 5' to 3', a second region of said first
primer/said first common region/a library element encoding region/said second
common region/a second region of said second primer,
providing a plurality of host cells;

providing a vector having a first region which is homologous with
said second region of said first primer, and a second region which is homologous
with said second region of said second primer;

introducing said vector molecule into each of said host cells; and
introducing one or more of said nucleic acid insert molecules into
each of said cells,
thereby providing a DNA library.

4. The method of claim 3, further comprising allowing homologous
recombination and gap repair between said vector molecule and said nucleic acid
insert molecule to occur.

5. The method of claim 3, wherein said first and second common
sequences are the same.

6. The method of claim 3, wherein said first and second common
sequences are different.

7. The method of claim 3, wherein said host cell is a yeast cell.

8. The method of claim 3, wherein said host cell is a bacterial cell. 

9. The method of claim 3, wherein said vector is linearized prior to
being introduced into said host cell.

10. The method of claim 9, wherein said vector is linearized by cleaving 
between said first and second regions of said vector. 

11. The method of claim 3, wherein said second region of said nucleic
acid insert molecule is produced by PCR, using primers having a first region which
is homologous to the 3' end of the element encoding region and a second region
which is homologous to the second region of the vector.

12. The method of claim 3, wherein said first region of said nucleic acid
insert molecule is produced by PCR, using primers having a first region which is
homologous to the 5' end of the element encoding region and a second region which
is homologous to the first region of the vector.

13. The method of claim 3, wherein said second region of said nucleic
acid insert molecule is produced by the ligation of adapters having a sequence
homologous to the second region of the vector.

14. The method of claim 3, wherein said first region of said nucleic acid
insert molecule is produced by the ligation of adapters having a sequence
homologous to the first region of the vector.

15. The method of claim 3, wherein said first and second regions of said
nucleic acid insert molecule are at least 30 base pairs in length.

16. The method of claim 3, wherein said first and second regions of said
nucleic acid insert molecule are at least 40 base pairs in length.

17. The method of claim 3, wherein said first and second regions of said
nucleic acid insert molecule are at least 50 base pairs in length.

18. The method of claim 3, wherein said library element encoding region 
is obtained from a cDNA library other than the one being constructed. 

19. The method of claim 18, wherein said library element encoding
region is obtained from a cDNA library which is plasmid based.

20. The method of claim 18, wherein said library element encoding
region is obtained from a cDNA library which is phage based.

21. The method of claim 3, wherein said library element encoding region
is obtained from an mRNA molecule.

22. The method of claim 21, wherein said mRNA molecule is derived
from a cancerous tissue.

23. The method of claim 3, wherein said DNA library is screened in a
two-hybrid system and wherein said vector includes a transcription factor activation
domain.

24. The method of claim 23, wherein said method further comprises,
introducing into said host cell a nucleic acid molecule encoding a
hybrid protein, wherein the hybrid protein comprises a transcription factor
DNA-binding domain attached to a test protein;

introducing into said host cell a detectable gene, wherein said
detectable gene comprises a regulator site recognized by said DNA-binding domain
and wherein said detectable gene expresses a detectable protein when said test
protein interacts with a protein encoded by the DNA library;

plating said host cell onto selective media; and

selecting for said host cell containing a DNA encoded protein which
interacts with test protein.

25. The method of claim 3, wherein said DNA library is used for 
screening and cloning of novel genes. 
26. A method of constructing a DNA library for screening in a
two-hybrid system, comprising:

providing a plurality of nucleic acid molecules, wherein each of the nucleic
acid molecules includes, in order from 5' to 3', a first common sequence, a library
element encoding region, and a second common sequence;

providing a plurality of first primers, each of said first primers having a first
region homologous with said first common sequence of said nucleic acid molecule
and having a second region which is not homologous with said first (and preferably
second) common sequence;

providing a plurality of second primers, each of said second primers having a
first region homologous with said second common sequence of said nucleic acid
molecule and having a second region which is not homologous with said second
(and preferably first) common sequence;

forming a reaction mixture which includes the plurality of nucleic acid
molecules, the plurality of said first primers, and the plurality of said second
primers, under conditions which provide a plurality of nucleic acid insert molecules
having the following structure, in order from 5' to 3', a second region of the first
primer/the first common region/a library element encoding region/the second
common region/a second region of the second primer;

providing a plurality of host cells;

providing a vector having a first region which is homologous with the
second region of the first primer, and a second region which is homologous with the
second region of the second primer, wherein said vector further includes a
transcription factor activation domain;

introducing a vector molecule into each of said host cells;

introducing one or more of the nucleic acid insert molecules into each of said
cells under conditions which allow for recombination and gap repair to occur;

introducing into said host cell a nucleic acid molecule encoding a hybrid
protein, wherein the hybrid protein includes a transcription factor DNA-binding
domain attached to a test protein;

introducing into said host cell a detectable gene, wherein said detectable gene
comprises a regulator site recognized by the DNA-binding domain and wherein said
detectable gene expresses a detectable protein when the test protein interacts with a
protein encoded by the DNA library;

plating said host cell onto selective media; and
selecting for said host cell containing a DNA encoded protein which interacts
with test protein.

27. A kit allowing the interchangeable use of a DNA library in more than
one application, comprising:

a plurality of first PCR oligonucleotide primers, each of said first
PCR primers having a first region homologous with the first common sequence used
in the construction of said DNA library, and a second region homologous with a first
region of a vector required for a particular application;
a plurality of second PCR oligonucleotide primers, each of said
second PCR primers having a first region homologous with the second common
sequence used in the construction of said DNA library, and a second region
homologous with a second region of a vector required for a particular application;
and

instructions for use.

28. An oligonucleotide primer having a first region homologous with a
linker sequence used in the construction of a DNA library, and a second region
homologous with an insertion region of a vector required for a particular application.

29. A method for screening a subject for the existence of a lesion in a
gene encoding a particular protein, comprising:

obtaining a tissue sample from said subject;
preparing from said tissue a plurality of nucleic acid insert
molecules having a first region, a library element encoding region and a second
region, wherein said library element encoding region encodes said protein or portion
thereof;

providing a vector having a first region which is homologous to the
first region of said nucleic acid insert molecule and a second region which is
homologous to the second region of said nucleic acid insert molecule, wherein said
vector is suitable for use in an assay which detects the interaction between two
proteins;

providing a host cell suitable for use in an assay which detects the
interaction between two proteins;
introducing into said host cell said nucleic acid insert molecule, and
said vector;
performing said assay which detects the interaction between two
proteins,

thereby screening subjects for the existence of a lesion in a gene
encoding a particular protein.

30. The method of claim 29, wherein the plurality of said nucleic acid
insert molecules are prepared by PCR using a first and a second primer, said first
primer having a first region comprising said first region of said nucleic acid insert
molecule and a second region homologous with a sequence in the library element
encoding region, and said second primer having a first region comprising said
second region of said nucleic acid insert molecule and a second region homologous
with a sequence in the library element encoding region.

31. The method of claim 29, wherein said assay is a two-hybrid assay.
SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Zervos, Antonis S. 
(ii) TITLE OF INVENTION: In Vivo Construction of DNA Libraries 
(iii) NUMBER OF SEQUENCES: 13 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: LAHIVE & COCKFIELD, LLP 

(B) STREET: 28 State Street 

(C) CITY: Boston 

(D) STATE: Massachusetts 

(E) COUNTRY: USA 

(F) ZIP: 02109 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Attorney, Louis Myers 

(B) REGISTRATION NUMBER: 35,965 

(C) REFERENCE/DOCKET NUMBER: MGP-063-1 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (617)227-7400 

(B) TELEFAX: (617)742-4214 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GAGATGCCTC CTACCCTTAT GATG 
24 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

GATTGGACAC TTGACCAAAC CTCT 
24 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

CTACCCTTAT GATGTGCCAG ATTA 
24 

(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

TTGACCAAAC CTCTGGCGAA GAAG
24

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

GATGTGCCAG ATTATGCCTC TCCC
24

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

CTCTGGCGAA GAAGTCCAAA GCTT
24

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

GAAGTCCAAA GCTTGAG
17
(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

ATTATGCCTC TCCCG
15

(2) INFORMATION FOR SEQ ID NO: 9: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
GAATTCNNNN NNN

(2) INFORMATION FOR SEQ ID NO:10:



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 13 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 



AGATCTNNNN NNN
13
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
AATTCGCGGC CGCGTCGAC 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 

(ix) FEATURE: 

(A) NAME/KEY: protein 

(B) LOCATION: 4 

(D) OTHER INFORMATION: /note= "Xaa is between 1 and 3 amino acids"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Cys Xaa Xaa Cys Xaa Cys Xaa Xaa Cys 
1 5 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 

(ix) FEATURE: 

(A) NAME/KEY: protein 

(B) LOCATION: 5 

(D) OTHER INFORMATION: /note= "Xaa is between 1 and 10 amino acids"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Pro Xaa Xaa Trp Xaa Trp Xaa Xaa Pro 
1 5 